Computer-Assisted Categorization of Patent Documents in the International Patent Classification

Authors

  • C. J. Fall
  • K. Benzineb
  • J. Guyot
  • P. Fiévet
Abstract

The World Intellectual Property Organization is currently developing a system for assisting users in categorizing patent documents in the International Patent Classification (IPC). The system should support the classification of documents in several languages and aims to assist users in locating relevant IPC symbols by providing them with a convenient web-based service. The approach taken for developing such a system relies on powerful machine learning algorithms that are trained on manually classified documents to recognize IPC topics. We detail in-house results of applying a custom-built state-of-the-art computer-assisted categorizer to English-, French-, Russian-, and German-language patent documents. We find that reliable computer-assisted categorization at IPC subclass level is an achievable goal for the statistical methods employed here. A categorization system suggesting three IPC symbols for each document can predict the main IPC class correctly for around 90% of documents, and the main IPC subclass for about 85% of documents. The accuracy of the system at main group level is enhanced if the user first validates the correct IPC class.
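The "three suggested symbols" figures quoted in the abstract can be read as a top-3 accuracy: a document counts as correct if its main IPC symbol, truncated to the level of interest, appears among the three suggestions. The following is a minimal sketch of that measure only, not the categorizer described in the paper. The function names `truncate` and `top_k_accuracy` and the sample data are hypothetical. It relies on the standard IPC layout, in which the first three characters of a symbol give the class (e.g. `A01`) and the first four give the subclass (e.g. `A01B`).

```python
def truncate(symbol: str, level: str) -> str:
    """Cut an IPC symbol such as 'A01B 1/02' down to its class or subclass."""
    length = {"class": 3, "subclass": 4}[level]  # 'A01' vs. 'A01B'
    return symbol.replace(" ", "")[:length]


def top_k_accuracy(suggestions, main_symbols, level="subclass", k=3):
    """Fraction of documents whose main IPC symbol is among the top-k suggestions.

    suggestions  -- one ranked list of suggested IPC symbols per document
    main_symbols -- the manually assigned main IPC symbol of each document
    """
    hits = 0
    for ranked, main in zip(suggestions, main_symbols):
        target = truncate(main, level)
        if target in {truncate(s, level) for s in ranked[:k]}:
            hits += 1
    return hits / len(main_symbols)


# Hypothetical sample: four documents, three ranked suggestions each.
suggestions = [
    ["A01B", "A01C", "B60K"],
    ["H04L", "G06F", "H04W"],
    ["C07D", "A61K", "C07C"],
    ["B60K", "B62D", "F16H"],
]
main_symbols = ["A01D 34/00", "G06F 17/30", "C07D 213/00", "F16D 1/00"]

print(top_k_accuracy(suggestions, main_symbols, level="class"))     # 1.0
print(top_k_accuracy(suggestions, main_symbols, level="subclass"))  # 0.5
```

In this toy sample the class-level score is higher than the subclass-level score because a suggestion in the wrong subclass can still fall in the right class, which matches the direction of the gap between the two figures reported in the abstract.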

Similar resources

Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed

Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Patent documents are another important information source, though they are considerably less accessible. One option to expand patent search beyond pure keywords...

Patent document categorization based on semantic structural information

The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial to search for relevant existing patents in a certain field, patent categorization is a very important and useful field. As patent documents are structu...

Development of a patent document classification and search platform using a back-propagation network

In order to process large numbers of explicit knowledge documents such as patents in an organized manner, automatic document categorization and search are required. In this paper, we develop a document classification and search methodology based on neural network technology that helps companies manage patent documents more effectively. The classification process begins by extracting key phrases...

Text Categorization for Intellectual Property Comparing Balanced Winnow with SVM on Different Document Representations

This study investigates the effect of training different categorization algorithms on various patent document representations. The automation of knowledge and content management in the intellectual property domain has been experiencing a growing interest in the last decade, since the first patent classification system was presented in 1999 by Larkey [Larkey, 1999]. Typical applications of paten...

Enhancing Patent Expertise through Automatic Matching with Scientific Papers

This paper focuses on a subtask of the QUAERO research program, a major innovating research project related to the automatic processing of multimedia and multilingual content. The objective discussed in this article is to propose a new method for the classification of scientific papers, developed in the context of an international patents classification plan related to the same field. The pract...

Publication date: 2003